FUSION PROTEINS COMPRISING A FRAGMENT OF A CHAPERON POLYPEPTIDE



The present invention relates to chaperone polypeptides which are active in the folding 
and maintenance of structural integrity of other proteins and the use thereof as fusion 
partners to assist in the expression of polypeptides in expression systems. The invention 
also relates to nucleic acids encoding chaperone polypeptides and fusion proteins as 
described, vectors comprising these nucleic acids, and host cells modified with the nucleic 
acids or vectors so as to express the fusion protein(s). 

Chaperones are in general known to be large multisubunit protein assemblies essential in
mediating polypeptide chain folding in a variety of cellular compartments. Several families
of chaperones have been identified. For example, the chaperonin hsp60 family, otherwise
known as the cpn60 class of proteins, is expressed constitutively, and examples are
found in the bacterial cytoplasm (GroEL), in endosymbiotically derived
mitochondria (hsp60) and in chloroplasts (Rubisco binding protein). Another chaperone
family is designated TF55/TCP1 and is found in the thermophilic archaea and the
evolutionarily connected eukaryotic cytosol. A comparison of amino acid sequence data
has shown that there is at least 50% sequence identity between chaperones found in
prokaryotes, mitochondria and chloroplasts (Ellis R J and Van der Vies S M (1991) Ann
Rev Biochem 60: 321-347).

A typical chaperonin is GroEL which is a member of the hsp60 family of heat shock 
proteins. GroEL is a tetradecamer wherein each monomeric subunit (cpn60m) has a 
molecular weight of approximately 57kD. The tetradecamer facilitates the in vitro folding 
of a number of proteins which would otherwise misfold or aggregate and precipitate. The 
structure of GroEL from E. coli has been established through X-ray crystallographic 
studies as reported by Braig K et al (1994) Nature 371: 578-586. The holo protein is 
cylindrical, consisting of two seven-membered rings that form a large central cavity which 
according to Ellis R J and Hartl F U (1996) FASEB Journal 10: 20-26 is generally
considered to be essential for activity. Some small proteins have been demonstrated to
fold from their denatured states when bound to GroEL (Gray T E and Fersht A R (1993) J
Mol Biol 232: 1197-1207; Hunt J F et al (1996) Nature 379: 37-45; Weissman J S et al
(1996) Cell 84: 481-490; Mayhew M et al (1996) Nature 379: 420-426; Corrales F J and
Fersht A R (1995) Proc Nat Acad Sci 92: 5326-5330) and it has been argued that a
cage-like structure is necessary to sequester partly folded or assembled proteins (Ellis R J
and Hartl F U (1996) supra).

The entire amino acid sequence of E. coli GroEL is also known (see Braig K et al (1994) 
supra) and three domains have been ascribed to each cpn60m of the holo chaperonin 
(tetradecamer). These are the intermediate (amino acid residues 1-5, 134-190, 377-408
and 524-548), equatorial (residues 6-133 and 409-523) and apical (residues 191-376)
domains.
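
The domain boundaries quoted above can be captured in a small lookup table. The following is a minimal Python sketch, assuming the residue ranges exactly as stated in this text; the names `GROEL_DOMAINS` and `domain_of` are illustrative only and do not appear in the cited work.

```python
# Residue ranges (1-based, inclusive) for each domain of an E. coli GroEL
# subunit (cpn60m), copied from the ranges quoted in the text above.
GROEL_DOMAINS = {
    "intermediate": [(1, 5), (134, 190), (377, 408), (524, 548)],
    "equatorial": [(6, 133), (409, 523)],
    "apical": [(191, 376)],
}


def domain_of(residue: int) -> str:
    """Return the name of the domain containing the given residue number."""
    for name, ranges in GROEL_DOMAINS.items():
        if any(start <= residue <= end for start, end in ranges):
            return name
    raise ValueError(f"residue {residue} is outside the range 1-548")
```

For instance, `domain_of(250)` places residue 250 in the apical domain, which is the region from which the preferred fragments discussed later are drawn.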

Monomers of GroEL have been induced by urea or pressure, but they are inactive and
have to reassociate to form the central cavity in order to facilitate the refolding of
rhodanese (Mendoza J A et al (1994) J Biol Chem 269: 2447-2451; Ybarra J and
Horowitz P M (1995) J Biol Chem 270: 22962-22967).

GroEL facilitates the folding of a number of proteins by two mechanisms: (1) it prevents
aggregation by binding to partly folded proteins (Goloubinoff P et al (1989) Nature 342:
884-889; Zahn R and Pluckthun A (1992) Biochemistry 31: 3249-3255), which then
refold on GroEL to a native-like state (Zahn R and Pluckthun A (1992) Biochemistry 31:
3249-3255; Gray T E and Fersht A R (1993) J Mol Biol 232: 1197-1207); and (2) it
continuously anneals misfolded proteins by unfolding them to a state from which
refolding can start again (Zahn R et al (1996) Science 271: 642-645). Some mutations in
the apical domain led to a decrease in polypeptide binding (Fenton W A et al (1994)
Nature 371: 614-619), suggesting that this domain is involved in the binding of
polypeptides. Electron microscopy suggests that denatured protein binds to the inner side
of the apical end of the GroEL cylinder (Chen S et al (1994) Nature 371: 261-264). The
equatorial domain has been shown, from the 2.4 Å crystal structure of ATPγS-ligated
GroEL (Boisvert D C et al (1996) Nature Struct Biol 3: 170-177) and mutagenesis
studies (Fenton W A et al (1994) Nature 371: 614-619), to contain the nucleotide binding
sites. Binding and hydrolysis of ATP is cooperative (Bochkareva E S et al (1992) J Biol
Chem 267: 6796-6800; Gray T E and Fersht A R (1991) FEBS Lett 292: 254-258) and
lowers the affinity for polypeptides (Jackson G S et al (1993) Biochemistry 32:
2554-2563). Most of the intermodular contacts between the subunits of GroEL are between
the equatorial domains. The intermediate domain connects the other two domains,
transmitting allosteric effects (Braig K et al (1994) Nature 371: 578-586; Braig K et al
(1995) Nature Struct Biol 2: 1083-1094).

The crystal structure of GroEL shows unusually high B-factors for the apical domain
compared with the equatorial or intermediate domain, and the B-factors vary considerably
within the domain (Braig K et al (1994) Nature 371: 578-586; Braig K et al (1995) Nature
Struct Biol 2: 1083-1094; Boisvert D C et al (1996) Nature Struct Biol 3: 170-177).
The high overall B-factor seems to result from a static disorder within the
asymmetric unit and probably throughout the crystals of GroEL, and has been attributed to
rigid-body movements generated by hinge-like β-sheets in the intermediate domain.
Regions of high flexibility have also been observed in the 2.8 Å structure of the
co-chaperonin GroES (Hunt J F et al (1996) Nature 379: 37-45). A mobile loop has been
shown to be directly involved in ADP-dependent binding to the apical domain (Landry S J
et al (1993) Nature 364: 255-258). Binding of GroES leads to a conformational change of
GroEL and a concomitant enlargement of the GroEL cavity (Chen S et al (1994) Nature
371: 261-264), in which the encapsulated polypeptide substrate can refold to a native-like
state without the danger of aggregation (Martin J et al (1993) Nature 366: 228-233;
Weissman J S et al (1995) Cell 83: 577-587).

Monomeric forms of GroEL have been induced by site-directed mutagenesis and
expressed, and although these bind to rhodanese they do not affect its refolding (White Z
W et al (1995) J Biol Chem 270: 20404-20409).

Yoshida et al (1993) FEBS Lett 336: 363-367 report that a 34kD proteolytic fragment of E.
coli GroEL which lacks 149 NH2-terminal residues and ~93 COOH-terminal residues
(GroEL 150-456) facilitates refolding of denatured rhodanese in the absence of GroES
and ATP. Although the proteolytic fragment GroEL 150-456 elutes as a monomer during
gel filtration, it still comprises the apical domain and significant portions of the
intermediate and equatorial domains, the latter of which determine the intersubunit
contacts of GroEL (Braig K et al (1994) supra), thus allowing transient formation of the
central cavity, thereby accounting for the chaperonin activity which is observed.

In any event, the mode of rhodanese refolding by GroEL 150-456 is very different from
that brought about by the holo protein; the yield of productive refolding is low, folding is
rapidly saturated with time, and it is not affected by GroES and ATP. Efficient release
and folding requires the hydrolysis of ATP (Landry S J et al (1992) Nature 355: 455-457;
Gray T E and Fersht A R (1992) FEBS Lett 282: 254-258; Jackson G S et al (1993)
Biochemistry 32: 2554-2563; Todd M et al (1993) Biochemistry 32: 8560-8567).

EP-A-0 650 975 (NIPPON OIL CO LTD) discloses chaperonin molecules and a method
of refolding denatured proteins using GroEL chaperonin 60 monomers (cpn60m) obtained
from Thermus thermophilus. The holo-chaperonin was first extracted and then purified
from the bacterial source according to the method of Taguchi et al (1991) J Biol Chem
266: 22411-22418. The cpn60m was then produced by treatment of the holo-chaperonin
with trifluoroacetic acid (TFA) followed by reverse phase (rp) HPLC of the resulting
denatured protein. A peak fraction containing the approximately 57kD cpn60m was
obtained. The refolding activity of the cpn60m was assayed in solution by monitoring the
regain in activity of inactivated rhodanese, which in specific activity terms amounted to
only about 25% of the specific activity of the rhodanese prior to inactivation. When
background spontaneous rhodanese refolding is subtracted, there is only an
approximately 20% refolding activity.

As well as cpn60m, EP-A-0 650 975 also discloses the use of an approximately 50kD
N-terminal deletion fragment of cpn60m wherein the N-terminal amino acid residues up to
(but not including) the Thr residue at position 79 are removed by proteolysis. This 50kD
fragment showed an approximately 35% (about 30% when background is subtracted)
rhodanese refolding activity when in solution.



Taguchi H et al (1994) J Biol Chem 269: 8529-8534 is a scientific report on which the 
invention of EP-A-0 650 975 is based. A transiently formed GroEL tetradecamer (the 
[...] compared with the sequence of Ams. Residues 307-423 were found to correspond
substantially between Ams and GroEL. These residues comprise nearly equivalent 
portions of both the intermediate and apical domains of GroEL. 

The sequence alignments of Ams protein with the chaperonins noted above reveal a
striking similarity (98%) between the amino-terminal four-fifths of Ams and the central
part (approximately one-fifth) of E. coli GroEL chaperonin. The 50% sequence similarity
between the Ams amino terminal region and the two other chaperonins is in line with the
reported identity among the chaperonin family. The carboxy-terminal part of the Ams
protein showed no similarity with chaperonins (<10% homology).

International Patent Application WO 98/13496, the entire contents of which are
incorporated herein by reference, describes fragments of chaperone molecules, termed
minichaperones, which are effective in promoting the folding of unfolded or misfolded
polypeptides. The fragments are monomeric in solution.

Recombinant DNA technology has allowed industry to produce many proteins of
commercial importance. Proteins are produced in a wide variety of expression systems
which are based on, for example, bacterial, yeast, insect, plant and mammalian cells. One
of the problems associated with the production of proteins by recombinant means is that
host cells contain enzymes which degrade proteins, and the presence of such enzymes
presents particular difficulties in the production of small polypeptides. Moreover,
polypeptides produced by recombinant DNA technology are frequently at least partially
incorrectly folded, such that yields of biologically active molecules vary according to the
ability of the expression system to promote correct folding. This can moreover be
problematic in the production of polypeptides destined for chemical and physical analysis,
for which structural homogeneity is highly relevant.

One approach to overcoming such difficulties is to express a recombinant protein of
interest in the form of a fusion protein. DNA encoding the protein of interest is fused
in-frame to DNA encoding a fusion partner protein and the resulting fusion is expressed.
Often, a linker sequence encoding a protease cleavage site between the two parts of the
fusion is included to allow cleavage of the fusion after it has been recovered from its
host cell.

WO 00/75346 PCT/GB00/01981

The fusion partner protein is often one which may be recovered and purified by some
form of highly specific affinity purification means. Examples of such proteins are well
known in the art and include, for example, glutathione-S-transferase, maltose binding
protein and β-lactamase.

However, these fusion partner proteins are all relatively large and thus have a number of
disadvantages. For example, it is essential to remove them before any mean[...]
function with any degree of independence. Many small polypeptides are still thus made
by chemical synthesis.

Summary of the Invention

The present invention provides fusion proteins which incorporate chaperone fragments as
fusion partners to promote high yield expression of correctly folded polypeptides in
biological expression systems. It has been observed that consistently higher yields of
recombinantly expressed polypeptides are obtained if the proteins are expressed as fusions
with a chaperone fragment.

According to a first aspect of the invention, therefore, there is provided a fusion protein
comprising:

a) a first region comprising a fragment of a chaperone polypeptide; and
b) a second region not naturally associated with the first region comprising a
polypeptide sequence of interest.

The term "fusion protein" is used in accordance with its ordinary meaning in the art and
refers to a single protein which is comprised of two or more regions which are derived
from different sources. Typically, a fusion protein is two proteins fused together by way
of in-frame fusion of their respective nucleic acid coding sequences.

A "chaperone fragment", as referred to herein, is any fragment of a molecular chaperone 
which possesses the ability to promote the folding of a polypeptide in vivo or in vitro. 
Preferred fragments are described in International patent application WO 98/13496,
incorporated herein by reference. Especially preferred are fragments 191-375, 191-345 
and 193-335 of GroEL. Advantageously, the GroEL is E. coli GroEL, as further 
described below. 



The fusion protein according to the invention, in addition to the chaperone fragment, 
includes a desired polypeptide. The desired polypeptide is typically a polypeptide which 
it is desired to express by recombinant DNA techniques; it is expressed as a fusion with 
the chaperone fragment in order to increase the yield of correctly folded product, in 
accordance with the present invention. Many polypeptides may be expressed as fusion 
proteins according to the present invention. However, the expression of smaller 
polypeptides, up to about 250 amino acids in length, is preferred. Preferably, the 
polypeptides are between about 5 and about 100 amino acids in length. 

Advantageously, the polypeptide is a eukaryotic polypeptide, such as a mammalian 
polypeptide. 



Preferably, the fusion protein according to the invention comprises a cleavable linker 
between the first and second regions thereof. The linker, which is typically a polypeptide 
chain cleavable by a protease, or by other means suitable for effecting polypeptide 
cleavage, may be cleaved after production of the fusion protein in order to facilitate 
recovery of the desired polypeptide. 

Moreover, in the event that the fusion protein comprises the chaperone fragment and the 
desired polypeptide as separate chains, held together otherwise than by a peptide bond, the 
cleavable linker may comprise an alternative cleavable site, such as a disulphide bond. 
Figure 2 shows an SDS-PAGE analysis of pHGro expression and purification systems.
Molecular weight standards in the range 14,000 - 70,000 Daltons are loaded in lanes 5, 10
and 15 (Sigma #SDS-7 Dalton Mark VII-I™). Analysis of the sonication extracts shows
that the Tenascin (lane 1), RNase HI (lane 6) and FKBP 12 (lane 11) fusion proteins are
all over-expressed to a high level. Following a three hour incubation with nickel affinity
resin, all visible traces of the fusion protein are removed from the Tenascin (lane 2),
RNase HI (lane 7) and FKBP 12 (lane 12) sonication extracts. Tenascin, RNase HI and
FKBP 12 fusion proteins are released into the elution buffers (lanes 3, 8 and 13
respectively). Thrombin successfully removes the GroEL fragment from Tenascin (lane
4), RNase HI (lane 9) and FKBP 12 (lane 14).

Detailed Description of the Invention.

A: First region.

The first region of the fusion protein of the invention may comprise any natural or 
synthetic chaperone fragment. 
Chaperone fragments suitable for use in the present invention are described, for example,
in International patent application WO 98/13496, the disclosure of which is incorporated
herein by reference.

chaperone polypeptide having an amino acid sequence selected from at least amino acid 
residues 230-271 but no more than residues 150-455 or 151-456 of a GroEL sequence 
substantially as shown in SEQ. ID. No. 1, or a corresponding sequence of a substantially 
homologous chaperone polypeptide, or a modified, mutated or variant thereof having 
chaperone activity. 

The sequence of GroEL is available in the art, as set forth above, and from academic 
databases; however, GroEL fragments which conform to the database sequence are 
inoperative. Specifically, the database contains a sequence in which positions 262 and 
There are fou[...]

suitable for use in the present invention. 
with at least 1 µM of said polypeptide at 25°C for at least 5 min, the resultant cyclophilin
A activity being assayed by the method of Fischer G et al (1984) Biomed Biochim Acta 
43: 1101-1111. 

The polypeptide is preferably an hsp60 polypeptide, preferably a GroEL polypeptide.

A preferred polypeptide has the amino acid sequence 191-345 or 191-376, more 
preferably 193-335 or 191-337 of GroEL, or the equivalent residues of substantially 
homologous chaperonins, or a modified, mutated or variant sequence thereof. 

The polypeptide preferably has a molecular weight of less than 34kDa.

"Modifications" include chemically modified polypeptides, for example. "Variants"
include, for example, naturally occurring variants of the kind to be found amongst a
population of hsp60 chaperonin harbouring organisms/cells as well as naturally occurring
polymorphisms or mutations. "Mutations" may also be introduced artificially by
processes of mutagenesis well known to a person skilled in the art.

In being "substantially homologous" peptides may have at least 50% amino acid sequence
homology with the specified GroEL amino acid sequences, preferably at least 60%
homology and more preferably 75% homology. Homology may of course also reside in
the nucleotide sequences for the polypeptide which may be at least 50%, preferably at
least 60% homologous and more preferably 75% homologous with the nucleotide
sequence encoding the specified GroEL amino acid residues.
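
As an illustration of how the thresholds above might be applied, the following Python sketch computes a simple percent identity over two sequences that have already been aligned, and maps the result onto the 50%, 60% and 75% levels. It is a sketch under stated assumptions: the text speaks of "homology" without defining a scoring method, so plain identity over gap-free columns is used here as a stand-in, and the function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Both inputs must be pre-aligned to the same length, with '-' marking gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only the columns in which both sequences carry a residue.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def homology_tier(percent: float) -> str:
    """Map a percentage onto the thresholds named in the text (assumed labels)."""
    if percent >= 75.0:
        return "more preferred (at least 75%)"
    if percent >= 60.0:
        return "preferred (at least 60%)"
    if percent >= 50.0:
        return "substantially homologous (at least 50%)"
    return "below the 50% threshold"
```

The alignment itself is outside the scope of this sketch and would normally be produced by a dedicated alignment tool before these functions are called.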



Where conservative substitutions are made they may be made by reference to the 
following table, where amino acids on the same block in the second column and 
preferably in the same line in the third column may be substituted for each other: 



ALIPHATIC     Non-polar            G A P
                                   I L V
              Polar - uncharged    C S T M
                                   N Q
              Polar - charged      D E
                                   R K
AROMATIC                           H F W Y
OTHER                              N Q D E

Synthetic variants of naturally-occurring chaperone proteins may be made by standard
recombinant DNA techniques. For example, site-directed mutagenesis may be used to
introduce changes to the coding region of a DNA encoding a naturally-occurring
coiled-coil protein. Where insertions are to be made, synthetic DNA encoding the insertion
together with 5' and 3' flanking regions corresponding to the naturally-occurring sequence
either side of the insertion site is prepared. The flanking regions will contain convenient
restriction sites corresponding to sites in the naturally-occurring sequence so that the
sequence may be cut with the appropriate enzyme(s) and the synthetic DNA ligated into
the cut. The DNA is then expressed in accordance with the invention to make the encoded
protein. These methods are only illustrative of the numerous standard techniques known in
the art for manipulation of DNA sequences and other known techniques may also be used.

The hsp60 class of chaperonin proteins are generally homologous in structure and so there
are conserved or substantially homologous amino acid sequences between the
members of the class. GroEL is just an example of an hsp60 chaperonin protein; other
suitable proteins having an homologous apical domain may be used.

A fusion protein according to the invention will comprise as small a chaperone fragment
as is feasible. This can be especially important in structural determination of
proteins by NMR, where it is often necessary to carry out isotopic labelling with 15N or
13C. This is expensive, and with a long fusion partner much of the incorporated label is
removed if the carrier protein (e.g. GST in many cases) is cleaved off.
B: Second Region.

The second region of the fusion protein according to the invention may comprise any
polypeptide sequence of interest which is not naturally associated with the first region.
Usually this will mean that the sequence of interest will be found in nature encoded by a
gene different from the gene encoding the first region. This may be determined easily by
examining the sequences of the first and second regions against publicly available
sequence databanks. The second region may be from the same species as the first region,
or from a different species. It is also possible that the first and second regions are derived
from portions of the same protein but are present in the fusion protein of the invention in
a manner different from the natural protein sequence.

The fusion protein according to the invention may be of any size although in general the
invention is particularly useful when the polypeptide sequence of interest is short, e.g.
from 2 to 100 amino acids in length, preferably 2 to 50 or even 2 to 30 or 5 to 10 amino
acids in size. However, larger polypeptide sequences of interest, e.g. up to 150, 200, 400 or
1000 amino acids, are also contemplated. The invention is particularly advantageous for
the preparation of small polypeptides which are currently difficult to manufacture by
recombinant means. Examples of such polypeptides include fragments of chaperone
proteins, metabolic enzymes, DNA and RNA binding proteins, antibodies, viral proteins,
intrinsic membrane proteins (including transport proteins from mitochondria, seven-helix
receptor molecules, T-cell receptors), and cytoskeletal complexes, antibody binding
peptides, peptide hormones (and other biologically active peptides made by ribosomal
synthesis), and small subunits from multi-subunit biological structures such as respiratory
enzymes and the ATP synthase. In general, the invention is suitable for use with peptides of
any dimension, but the advantageous properties thereof are best exploited with small
polypeptides, for example from 2 to 50 amino acids in length, particularly from 2 to 20
amino acids in length, and preferably from 5 to 10 amino acids in length.

A particular advantage of the present invention is that peptides may be produced by 
recombinant DNA technology which are so short that they would previously have been 
made by oligopeptide synthesis techniques. Thus it is possible to produce libraries of
peptides, for example of mutants of biologically active peptides, which may be screened
or otherwise analysed, cheaply and efficiently in recombinant expression systems,
particularly bacterial expression systems.

C: Cleavable linker region. 

Where the first and second regions are linked by a cleavable linker region this may be any
region suitable for this purpose. Preferably, the cleavable linker region is a protease
cleavable linker, although other linkers, cleavable for example by small molecules, may
be used. These include Met-X sites, cleavable by cyanogen bromide; Asn-Gly, cleavable
by hydroxylamine; Asp-Pro, cleavable by weak acid; and Trp-X, cleavable by, inter alia,
NBS-skatole. Protease cleavage sites are preferred due to the milder cleavage conditions
necessary and include those recognised by, for example, factor Xa, thrombin and
collagenase. Any of these may be used. The precise sequences are available in the art and
the skilled person will have no difficulty in selecting a suitable cleavage site. By way of
example, the protease cleavage region targeted by Factor Xa is I E G R. The protease
cleavage region targeted by Enterokinase is D D D D K. The protease cleavage region
targeted by Thrombin is L V P R G.
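
To illustrate how the three example recognition sequences might be located in a candidate linker or fusion sequence, the following minimal Python sketch searches a one-letter amino acid string for each motif. Only the three motifs are taken from the text; the dictionary name, the function name and the example sequence in the usage note are hypothetical.

```python
# Protease recognition sequences quoted in the text (one-letter code).
CLEAVAGE_SITES = {
    "Factor Xa": "IEGR",
    "Enterokinase": "DDDDK",
    "Thrombin": "LVPRG",
}


def find_cleavage_sites(sequence: str) -> dict:
    """Return {protease: [1-based start positions]} for every motif found."""
    sequence = sequence.upper()
    hits = {}
    for protease, motif in CLEAVAGE_SITES.items():
        positions = []
        start = sequence.find(motif)
        while start != -1:
            positions.append(start + 1)  # report 1-based residue numbers
            start = sequence.find(motif, start + 1)
        if positions:
            hits[protease] = positions
    return hits
```

For a made-up linker such as `GGSLVPRGSMK`, the function reports a thrombin site starting at residue 4. A check of this kind can also be run on the polypeptide of interest itself, since an internal copy of the motif would be cut along with the linker.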

D. Nucleic acids. 

The invention also provides nucleic acid encoding the fusion proteins of the invention.
These may be constructed using standard recombinant DNA methodologies. The nucleic
acid may be RNA or DNA and is preferably DNA. Where it is RNA, manipulations may
be performed via cDNA intermediates. Generally, a nucleic acid sequence encoding the
first region will be prepared and suitable restriction sites provided at the 5' and/or 3' ends.
Conveniently the sequence is manipulated in a standard laboratory vector, such as a
plasmid vector based on pBR322 or pUC19 (see below). Reference may be made to
Molecular Cloning by Sambrook et al (Cold Spring Harbor, 1989) or similar standard
reference books for exact details of the appropriate techniques.
Nucleic acid encoding the second region may likewise be provided in a similar vector
system. Sources of nucleic acid may be ascertained by reference to published literature or 
databanks such as Genbank. 

Nucleic acid encoding the desired first or second region may be obtained from academic
or commercial sources where such sources are willing to provide the material or by 
synthesising or cloning the appropriate sequence where only the sequence data are 
available. Generally this may be done by reference to literature sources which describe 
the cloning of the gene in question. 

Alternatively, where limited sequence data are available or where it is desired to express a 
nucleic acid homologous or otherwise related to a known nucleic acid, exemplary nucleic 
acids can be characterised as those nucleotide sequences which hybridise to the nucleic 
acid sequences known in the art. 

Stringency of hybridisation refers to conditions under which polynucleic acid hybrids are
stable. Such conditions are evident to those of ordinary skill in the field. As known to
those of skill in the art, the stability of hybrids is reflected in the melting temperature
(Tm) of the hybrid which decreases approximately 1 to 1.5°C with every 1% decrease in
sequence homology. In general, the stability of a hybrid is a function of sodium ion
concentration and temperature. Typically, the hybridisation reaction is performed under
conditions of higher stringency, followed by washes of varying stringency.
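
As a worked illustration of the rule of thumb above, the following Python sketch estimates the range within which the Tm of an imperfect hybrid would fall, given the Tm of the perfectly matched hybrid. It assumes only the stated 1 to 1.5°C drop per 1% loss of homology; the function name is hypothetical and the starting Tm must be supplied from measurement or a separate calculation.

```python
def tm_range_for_homology(tm_perfect_c: float, percent_homology: float) -> tuple:
    """Return (lowest, highest) expected Tm in degrees C for a partly matched hybrid.

    Applies the approximation that Tm falls by 1 to 1.5 degrees C for every
    1% decrease in sequence homology relative to a perfect match.
    """
    if not 0.0 <= percent_homology <= 100.0:
        raise ValueError("percent_homology must lie between 0 and 100")
    mismatch_percent = 100.0 - percent_homology
    lowest = tm_perfect_c - 1.5 * mismatch_percent
    highest = tm_perfect_c - 1.0 * mismatch_percent
    return (lowest, highest)
```

For example, a hybrid with a perfect-match Tm of 80°C and 90% homology would be expected to melt somewhere between 65°C and 70°C under this approximation, which indicates how far the wash temperature could be raised before such a hybrid is lost.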

As used herein, high stringency refers to conditions that permit hybridisation of only those
nucleic acid sequences that form stable hybrids in 1 M Na+ at 65-68°C. High stringency
conditions can be provided, for example, by hybridisation in an aqueous solution
containing 6x SSC, 5x Denhardt's, 1% SDS (sodium dodecyl sulphate), 0.1 Na+
pyrophosphate and 0.1 mg/ml denatured salmon sperm DNA as non-specific competitor.
Following hybridisation, high stringency washing may be done in several steps, with a
final wash (about 30 min) at the hybridisation temperature in 0.2 - 0.1x SSC, 0.1% SDS.
Moderate stringency refers to conditions equivalent to hybridisation in the above
described solution but at about 60-62°C. In that case the final wash is performed at the
hybridisation temperature in 1x SSC, 0.1% SDS.

Low stringency refers to conditions equivalent to hybridisation in the above described
solution at about 50-52°C. In that case, the final wash is performed at the hybridisation 
temperature in 2x SSC, 0.1 % SDS. 

It is understood that these conditions may be adapted and duplicated using a variety of
buffers, e.g. formamide-based buffers, and temperatures. Denhardt's solution and SSC
are well known to those of skill in the art as are other suitable hybridisation buffers (see,
e.g. Sambrook, et al, eds. (1989) Molecular Cloning: A Laboratory Manual, Cold
Spring Harbor Laboratory Press, New York or Ausubel, et al, eds. (1990) Current
Protocols in Molecular Biology, John Wiley & Sons, Inc.). Optimal hybridisation
conditions have to be determined empirically, as the length and the GC content of the
probe also play a role.

Given the guidance provided herein, nucleic acids suitable for forming the first or second
region of a fusion protein according to the invention are obtainable according to methods
well known in the art. For example, a DNA of the invention is obtainable by chemical
synthesis, using polymerase chain reaction (PCR) or by screening a genomic library or a
suitable cDNA library prepared from a source believed to possess the desired nucleic acid
and to express it at a detectable level.

Chemical methods for synthesis of a nucleic acid of interest are known in the art and
include triester, phosphite, phosphoramidite and H-phosphonate methods, PCR and other 
autoprimer methods as well as oligonucleotide synthesis on solid supports. These 
methods may be used if the entire nucleic acid sequence of the nucleic acid is known, or 
the sequence of the nucleic acid complementary to the coding strand is available. 

Alternatively, if the target amino acid sequence is known, one may infer potential nucleic
acid sequences using known and preferred coding residues for each amino acid residue. 
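
The inference of a coding sequence from a known amino acid sequence can be sketched as a simple table lookup. The Python example below is a minimal sketch under stated assumptions: the codon table gives one valid codon per amino acid, chosen for illustration only, and is not taken from this document; in practice it would be replaced with a codon usage table for the intended host. The names `PREFERRED_CODON` and `back_translate` are hypothetical.

```python
# One valid codon per amino acid (standard genetic code). The particular
# choices are illustrative assumptions, not preferences stated in the text.
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}


def back_translate(peptide: str) -> str:
    """Return one possible DNA coding sequence for a one-letter peptide."""
    try:
        return "".join(PREFERRED_CODON[residue] for residue in peptide.upper())
    except KeyError as err:
        raise ValueError(f"unknown residue: {err.args[0]}") from None
```

Applied to the thrombin recognition sequence LVPRG, this table yields the 15-nucleotide sequence `CTGGTGCCGCGTGGC`. Many other coding sequences are equally valid because most amino acids have several codons.
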
An alternative means to isolate the gene encoding the desired region of the fusion protein
is to use PCR technology as described e.g. in section 14 of Sambrook et al., 1989. This 
method requires the use of oligonucleotide probes that will hybridise to the desired 
nucleic acid. Strategies for selection of oligonucleotides are described below. 

Libraries are screened with probes or analytical tools designed to identify the gene of
interest or the protein encoded by it. For cDNA expression libraries suitable means
include monoclonal or polyclonal antibodies that recognise and specifically bind to the
desired protein; oligonucleotides of about 20 to 80 bases in length that encode known or
suspected cDNA encoding the desired protein from the same or different species; and/or
complementary or homologous cDNAs or fragments thereof that encode the same or a
hybridising gene. Appropriate probes for screening genomic DNA libraries include, but
are not limited to, oligonucleotides, cDNAs or fragments thereof that encode the same or
hybridising DNA; and/or homologous genomic DNAs or fragments thereof.

A nucleic acid encoding the desired protein may be isolated by screening suitable cDNA 
or genomic libraries under suitable hybridisation conditions with a probe. 

As used herein, a probe is e.g. a single-stranded DNA or RNA that has a sequence of 
20 nucleotides that includes between 10 and 50, preferably between 15 and 30 and most 
preferably at least about 20 contiguous bases that are the same as (or the complement of) 
an equivalent or greater number of contiguous bases from a known or desired sequence. 
The nucleic acid sequences selected as probes should be of sufficient length and 
sufficiently unambiguous so that false positive results are minimised. The nucleotide 
sequences are usually based on conserved or highly homologous nucleotide sequences or
regions of the desired protein. The nucleic acids used as probes may be degenerate at one 
or more positions. The use of degenerate oligonucleotides may be of particular 
importance where a library is screened from a species in which preferential codon usage 
in that species is not known. 
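
The effect of degeneracy on a probe pool can be sketched as follows. The fragment expands an oligonucleotide written with IUPAC ambiguity codes into every concrete sequence it represents; the function names and the example oligonucleotide are illustrative assumptions.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(oligo):
    """Number of distinct sequences represented by a degenerate oligo."""
    total = 1
    for base in oligo.upper():
        total *= len(IUPAC[base])
    return total


def expand(oligo):
    """List every concrete sequence represented by a degenerate oligo."""
    choices = (IUPAC[base] for base in oligo.upper())
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    probe = "ATGCAYCAY"  # Met-His-His with both histidine codons left open
    print(degeneracy(probe))
    print(expand(probe))
```

A pool with lower degeneracy contains a larger fraction of the exactly matching sequence, which helps keep false positives down during screening.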

Preferred regions from which to construct probes include 5' and/or 3' coding sequences,
sequences predicted to encode ligand binding sites, and the like. For example, either the
full-length cDNA clone disclosed herein as SEQ. ID. No. 1 or fragments thereof can be
used as probes, especially for isolating first region-encoding genes. Preferably, nucleic
acid probes of the invention are labelled with suitable label means for ready detection
upon hybridisation. For example, a suitable label means is a radiolabel. The preferred
method of labelling a DNA fragment is by incorporating α-32P dATP with the Klenow
fragment of DNA polymerase in a random priming reaction, as is well known in the art.
Oligonucleotides are usually end-labelled with γ-32P-labelled ATP and polynucleotide
kinase. However, other methods (e.g. non-radioactive) may also be used to label the
fragment or oligonucleotide, including e.g. enzyme labelling, fluorescent labelling with
suitable fluorophores and biotinylation.

After screening the library, e.g. with a portion of DNA including substantially the entire 
desired sequence or a suitable oligonucleotide based on a portion of said DNA, positive 
clones are identified by detecting a hybridisation signal; the identified clones are 
characterised by restriction enzyme mapping and/or DNA sequence analysis, and then
examined to ascertain whether they include DNA encoding a complete polypeptide (i.e., if 
they include translation initiation and termination codons). If the selected clones are 
incomplete, they may be used to rescreen the same or a different library to obtain 
overlapping clones. If the library is genomic, then the overlapping clones may include 
exons and introns. If the library is a cDNA library, then the overlapping clones will
include an open reading frame. In both instances, complete clones may be identified by 
comparison with the DNAs and deduced amino acid sequences provided herein. 
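
The check for translation initiation and termination codons can be sketched as below. This is a deliberately simplified illustration: it considers only the first ATG on the given strand and reports whether an in-frame stop codon follows it. The function name and test sequences are assumptions, and a practical analysis would also examine the other reading frames and the complementary strand.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_complete_orf(sequence):
    """Return True if the first ATG is followed by an in-frame stop codon."""
    seq = sequence.upper()
    start = seq.find("ATG")
    if start < 0:
        return False  # no initiation codon at all
    # Walk codon by codon from the initiation codon onwards.
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return True
    return False  # reading frame runs off the end of the clone


if __name__ == "__main__":
    print(has_complete_orf("GGATGCATTAAGG"))  # ATG CAT TAA
    print(has_complete_orf("GGATGCATCATGG"))  # no in-frame stop
```

A clone that fails this test is a candidate for the rescreening step described above.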

It is envisaged that the nucleic acid of the invention can be readily modified by nucleotide 
substitution, nucleotide deletion, nucleotide insertion or inversion of a nucleotide stretch,
and any combination thereof. Such mutants can be used e.g. to produce a mutant that has 
an amino acid sequence differing from the sequences as found in nature. Mutagenesis 
may be predetermined (site-specific) or random. A mutation which is not a silent 
mutation must not place sequences out of reading frames and preferably will not create 
complementary regions that could hybridise to produce secondary mRNA structure such
as loops or hairpins.

The foregoing methods may, of course, be applied to the identification and modification
or generation of sequences useful in any part of the fusion protein of the invention. In
particular, the sequence of the IF, polypeptide coiled coil provided herein as SEQ. ID.
No. 1, or suitable fragments thereof as discussed above, may be used as a probe for the
identification of further suitable sequences.

The first or second region may also be manipulated to introduce an appropriate restriction
enzyme site at the terminus which is to be linked to the nucleic acid encoding the other
region via a corresponding restriction enzyme site. Desirably the sites will be either the
same or at least have matching cohesive ends. Of course, the first and second regions
may be joined by alternative means; for example, the first region may be incorporated into
primers used to isolate or replicate the second region.

Where a protease cleavable linker region is required, this may be introduced into the
linked first and second regions (e.g. into the restriction site linking the two) or introduced
into one or the other prior to their combination. 
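
The requirement for matching cohesive ends can be illustrated with a short sketch. The recognition sites and cut positions below are the standard values for these enzymes; Bgl II does not appear in this specification and is included only to show two different sites that leave the same overhang. The dictionary layout and function names are assumptions made for illustration.

```python
# Palindromic recognition site and top-strand cut position (bases before
# the cut) for enzymes that leave 5' overhangs.
ENZYMES = {
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "BglII": ("AGATCT", 1),    # A^GATCT
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "NdeI": ("CATATG", 2),     # CA^TATG
}


def overhang(enzyme):
    """Single-stranded 5' overhang left after digestion."""
    site, cut = ENZYMES[enzyme]
    return site[cut:len(site) - cut]


def compatible(enzyme_a, enzyme_b):
    """True if the two enzymes leave identical, ligatable overhangs."""
    return overhang(enzyme_a) == overhang(enzyme_b)


if __name__ == "__main__":
    print(overhang("BamHI"), compatible("BamHI", "BglII"))
    print(overhang("EcoRI"), compatible("BamHI", "EcoRI"))
```

Joining ends cut by two different but compatible enzymes generally yields a junction that neither enzyme recognises, which can matter if a linker is later to be inserted at that site.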



E. Expression vectors and host cells.

The nucleic acid encoding a fusion protein according to the invention, or constituent
part(s) thereof, can be incorporated into vectors for further manipulation. As used herein,
vector (or plasmid) refers to discrete elements that are used to introduce heterologous
DNA into cells for either expression or replication thereof. Selection and use of such
vehicles are well within the skill of the artisan. Many vectors are available, and selection
of an appropriate vector will depend on the intended use of the vector, i.e. whether it is to be
used for DNA amplification or for DNA expression, the size of the DNA to be inserted
into the vector, and the host cell to be transformed with the vector. Each vector contains
various components depending on its function (amplification of DNA or expression of
DNA) and the host cell for which it is compatible. The vector components generally
include, but are not limited to, one or more of the following: an origin of replication, one
or more marker genes, an enhancer element, a promoter, a transcription termination
sequence and a signal sequence.

Both expression and cloning vectors generally contain nucleic acid sequences that enable
the vector to replicate in one or more selected host cells. Typically in cloning vectors,
these sequences enable the vector to replicate independently of the host chromosomal
DNA, and include origins of replication or autonomously replicating sequences. Such
sequences are well known for a variety of bacteria, yeast and viruses. The origin of
replication from the plasmid pBR322 is suitable for most Gram-negative bacteria, the 2µ
plasmid origin is suitable for yeast, and various viral origins (e.g. SV40, polyoma,
adenovirus) are useful for cloning vectors in mammalian cells. Generally, the origin of
replication component is not needed for mammalian expression vectors unless these are
used in mammalian cells competent for high level DNA replication, such as COS cells.

Most expression vectors are shuttle vectors, i.e. they are capable of replication in at least
one class of organisms but can be transfected into another organism for expression. For
example, a vector is cloned in E. coli and then the same vector is transfected into yeast or
mammalian cells even though it is not capable of replicating independently of the host cell
chromosome. DNA may also be replicated by insertion into the host genome. However,
the recovery of genomic DNA encoding the fusion protein of the invention is more
complex than that of exogenously replicated vector because restriction enzyme digestion
is required to excise the DNA. DNA can be amplified by PCR and be directly transfected
into the host cells without any replication component.

Advantageously, an expression and cloning vector may contain a selection gene, also
referred to as a selectable marker. This gene encodes a protein necessary for the survival or
growth of transformed host cells grown in a selective culture medium. Host cells not
transformed with the vector containing the selection gene will not survive in the culture
medium. Typical selection genes encode proteins that confer resistance to antibiotics and
other toxins, e.g. ampicillin, neomycin, methotrexate or tetracycline, complement
auxotrophic deficiencies, or supply critical nutrients not available from complex media.

As to a selective gene marker appropriate for yeast, any marker gene can be used which
facilitates the selection for transformants due to the phenotypic expression of the marker
gene. Suitable markers for yeast are, for example, those conferring resistance to
antibiotics G418, hygromycin or bleomycin, or those providing for prototrophy in an auxotrophic
yeast mutant, for example the URA3, LEU2, LYS2, TRP1, or HIS3 gene.

Since the replication of vectors is conveniently done in E. coli, an E. coli genetic marker 
and an E. coli origin of replication are advantageously included. These can be obtained 
from E. coli plasmids, such as pBR322, Bluescript© vector or a pUC plasmid, e.g. 
pUC18 or pUC19, which contain both E. coli replication origin and E. coli genetic 
marker conferring resistance to antibiotics, such as ampicillin. 



Suitable selectable markers for mammalian cells are those that enable the identification of
cells competent to take up vector nucleic acid, such as dihydrofolate reductase (DHFR,
methotrexate resistance), thymidine kinase, or genes conferring resistance to G418 or
hygromycin. The mammalian cell transformants are placed under selection pressure
which only those transformants which have taken up and are expressing the marker are
uniquely adapted to survive. In the case of a DHFR or glutamine synthetase (GS) marker,
selection pressure can be imposed by culturing the transformants under conditions in
which the pressure is progressively increased, thereby leading to amplification (at its
chromosomal integration site) of both the selection gene and the linked DNA that encodes
the fusion protein. Amplification is the process by which genes in greater demand for the
production of a protein critical for growth, together with closely associated genes which
may encode a desired protein, are reiterated in tandem within the chromosomes of
recombinant cells. Increased quantities of desired protein are usually synthesised from
thus amplified DNA.

Expression and cloning vectors usually contain a promoter that is recognised by the host
organism and is operably linked to the fusion-protein encoding nucleic acid. Such a
promoter may be inducible or constitutive. The promoters are operably linked to DNA
encoding the fusion protein by removing the promoter from the source DNA by restriction
enzyme digestion and inserting the isolated promoter sequence into the vector. Both the
native promoter sequence of one of the components of the fusion protein and many
heterologous promoters may be used to direct amplification and/or expression of the
DNA. The term "operably linked" refers to a juxtaposition wherein the components
described are in a relationship permitting them to function in their intended manner. A
control sequence "operably linked" to a coding sequence is ligated in such a way that
expression of the coding sequence is achieved under conditions compatible with the
control sequences.

Promoters suitable for use with prokaryotic hosts include, for example, the β-lactamase
and lactose promoter systems, alkaline phosphatase, the tryptophan (trp) promoter system
and hybrid promoters such as the tac promoter. Their nucleotide sequences have been
published, thereby enabling the skilled worker operably to ligate them to DNA encoding
the fusion protein using linkers or adaptors to supply any required restriction sites.
Promoters for use in bacterial systems will also generally contain a Shine-Dalgarno
sequence operably linked to the DNA encoding the fusion protein.

Preferred expression vectors are bacterial expression vectors which comprise a promoter
of a bacteriophage such as phagex or T7 which is capable of functioning in the bacteria.
In one of the most widely used expression systems, the nucleic acid encoding the fusion
protein may be transcribed from the vector by T7 RNA polymerase (Studier et al,
Methods in Enzymol. 185; 60-89, 1990). In the E. coli BL21(DE3) host strain, used in
conjunction with pET vectors, the T7 RNA polymerase is produced from the λ-lysogen
DE3 in the host bacterium, and its expression is under the control of the IPTG inducible
lac UV5 promoter. This system has been employed successfully for over-production of
many globular proteins, but in many other cases significant over-production cannot be
achieved because of the toxicity of over-expression (Studier et al., 1990; George et al,
J. Mol. Biol. 235: 424-435, 1994). Alternatively the polymerase gene may be introduced
on a lambda phage by infection with an int- phage such as the CE6 phage which is
commercially available (Novagen, Madison, USA). Other vectors include vectors
containing the lambda PL promoter such as PLEX (Invitrogen, NL), vectors containing
the trc promoters such as pTrcHisXpressTm (Invitrogen) or pTrc99 (Pharmacia Biotech,
SE), or vectors containing the tac promoter such as pKK223-3 (Pharmacia Biotech) or
PMAL (New England Biolabs, MA, USA).

Moreover, the fusion protein gene according to the invention may include a secretion 
sequence in order to facilitate secretion of the polypeptide from bacterial hosts, such that
it will be produced as a soluble native peptide rather than in an inclusion body. The 
peptide may be recovered from the bacterial periplasmic space, or the culture medium, as 
appropriate. 

Suitable promoting sequences for use with yeast hosts may be regulated or constitutive
and are preferably derived from a highly expressed yeast gene, especially a
Saccharomyces cerevisiae gene. Thus, the promoter of the TRP1 gene, the ADHI or
ADHII gene, the acid phosphatase (PHO5) gene, a promoter of the yeast mating
pheromone genes coding for the a- or α-factor or a promoter derived from a gene
encoding a glycolytic enzyme such as the promoter of the enolase, glyceraldehyde-3-phosphate
dehydrogenase (GAP), 3-phosphoglycerate kinase (PGK), hexokinase,
pyruvate decarboxylase, phosphofructokinase, glucose-6-phosphate isomerase,
3-phosphoglycerate mutase, pyruvate kinase, triose phosphate isomerase, phosphoglucose
isomerase or glucokinase genes, the S. cerevisiae GAL 4 gene, the S. pombe nmt 1 gene
or a promoter from the TATA binding protein (TBP) gene can be used. Furthermore, it is
possible to use hybrid promoters comprising upstream activation sequences (UAS) of one
yeast gene and downstream promoter elements including a functional TATA box of
another yeast gene, for example a hybrid promoter including the UAS(s) of the yeast
PHO5 gene and downstream promoter elements including a functional TATA box of the
yeast GAP gene (PHO5-GAP hybrid promoter). A suitable constitutive PHO5 promoter is
e.g. a shortened acid phosphatase PHO5 promoter devoid of the upstream regulatory
elements (UAS) such as the PHO5 (-173) promoter element starting at nucleotide -173 and
ending at nucleotide -9 of the PHO5 gene.



Fusion protein gene transcription from vectors in mammalian hosts may be controlled by
promoters derived from the genomes of viruses such as polyoma virus, adenovirus,
fowlpox virus, bovine papilloma virus, avian sarcoma virus, cytomegalovirus (CMV), a
retrovirus and Simian Virus 40 (SV40), from heterologous mammalian promoters such as
the actin promoter or a very strong promoter, e.g. a ribosomal protein promoter, and from
the promoter normally associated with the gene encoding a component of the fusion
protein, provided such promoters are compatible with the host cell systems.

Transcription of a DNA encoding the fusion protein by higher eukaryotes may be
increased by inserting an enhancer sequence into the vector. Enhancers are relatively
orientation and position independent. Many enhancer sequences are known from
mammalian genes (e.g. elastase and globin). However, typically one will employ an
enhancer from a eukaryotic cell virus. Examples include the SV40 enhancer on the late
side of the replication origin (bp 100-270) and the CMV early promoter enhancer. The
enhancer may be spliced into the vector at a position 5' or 3' to the coding sequence, but is
preferably located at a site 5' from the promoter.

Advantageously, a eukaryotic expression vector encoding the fusion protein may
comprise a locus control region (LCR). LCRs are capable of directing high-level
integration site independent expression of transgenes integrated into host cell chromatin,
which is of importance especially where the fusion protein gene is to be expressed in the
context of a permanently-transfected eukaryotic cell line in which chromosomal
integration of the vector has occurred, in vectors designed for gene therapy applications
or in transgenic animals.

An expression vector includes any vector capable of expressing nucleic acids that are
operatively linked with regulatory sequences, such as promoter regions, that are capable
of expression of such DNA. Thus, an expression vector refers to a recombinant DNA or
RNA construct, such as a plasmid, a phage, recombinant virus or other vector, that upon
introduction into an appropriate host cell, results in expression of the cloned DNA.
Appropriate expression vectors are well known to those with ordinary skill in the art and
include those that are replicable in eukaryotic and/or prokaryotic cells and those that
remain episomal or those which integrate into the host cell genome. For example, DNA
encoding the fusion protein according to the invention may be inserted into a vector
suitable for expression of cDNAs in mammalian cells, e.g. a CMV enhancer-based vector
such as pEVRF (Matthias, et al., (1989) NAR 17, 6418).

Particularly useful for practising the present invention are expression vectors that provide 
for the transient expression of DNA encoding the fusion protein in mammalian cells. 
Transient expression usually involves the use of an expression vector that is able to 
replicate efficiently in a host cell, such that the host cell accumulates many copies of the 
expression vector, and, in turn, synthesises high levels of fusion protein. For the purposes 
of the present invention, transient expression systems are useful e.g. for identifying 
fusion protein mutants, to identify potential phosphorylation sites, or to characterise 
functional domains of the protein. 

Construction of vectors according to the invention employs conventional ligation 
techniques. Isolated plasmids or DNA fragments are cleaved, tailored, and religated in 
the form desired to generate the plasmids required. If desired, analysis to confirm correct 
sequences in the constructed plasmids is performed in a known fashion. Suitable methods 
for constructing expression vectors, preparing in vitro transcripts, introducing DNA into 
host cells, and performing analyses for assessing expression and function are known to 
those skilled in the art. Gene presence, amplification and/or expression may be measured 
in a sample directly, for example, by conventional Southern blotting, Northern blotting to 
quantitate the transcription of mRNA, dot blotting (DNA or RNA analysis), or in situ 
hybridisation, using an appropriately labelled probe based on a sequence provided herein. 
Those skilled in the art will readily envisage how these methods may be modified, if 
desired. 

The invention moreover provides an expression vector comprising a first nucleic acid 
sequence encoding a polypeptide capable of forming a coiled coil structure operably 
linked to a promoter capable of expressing the first nucleic acid sequence in a host cell, 
and, linked to the nucleic acid sequence, a cloning site permitting the insertion of a second 
nucleic acid sequence such that it is capable of being expressed in fusion with the first
nucleic acid sequence. Such a vector is a useful vehicle for expressing nucleic acids
encoding any desired polypeptide in the form of a fusion protein according to the
invention.

A further embodiment of the invention provides host cells transformed or transfected with 
the vectors for the replication and expression of polynucleotides of the invention. The
cells will be chosen to be compatible with the vector and may for example be bacterial, 
yeast, insect or mammalian. 

Host cells such as prokaryote, yeast and higher eukaryote cells may be used for
replicating DNA and producing the fusion protein. Suitable prokaryotes include
eubacteria, such as Gram-negative or Gram-positive organisms, such as E. coli, e.g. E.
coli K-12 strains, DH5α and HB101, or Bacilli. Further hosts suitable for fusion protein
encoding vectors include eukaryotic microbes such as filamentous fungi or yeast, e.g.
Saccharomyces cerevisiae. Higher eukaryotic cells include insect and vertebrate cells,
particularly mammalian cells. In recent years propagation of vertebrate cells in culture
(tissue culture) has become a routine procedure. Examples of useful mammalian host cell
lines are epithelial or fibroblastic cell lines such as Chinese hamster ovary (CHO) cells,
NIH 3T3 cells, HeLa cells or 293T cells. The host cells referred to in this disclosure
comprise cells in in vitro culture as well as cells that are within a host animal.

DNA may be stably incorporated into cells or may be transiently expressed using methods
known in the art. Stably transfected mammalian cells may be prepared by transfecting
cells with an expression vector having a selectable marker gene, and growing the
transfected cells under conditions selective for cells expressing the marker gene. To
prepare transient transfectants, mammalian cells are transfected with a reporter gene to
monitor transfection efficiency.

To produce such stably or transiently transfected cells, the cells should be transfected with
a sufficient amount of fusion protein-encoding nucleic acid to form the fusion protein.
The precise amounts of DNA encoding the fusion protein may be empirically determined
and optimised for a particular cell and assay.

Host cells are transfected or, preferably, transformed with the above-captioned expression
or cloning vectors of this invention and cultured in conventional nutrient media modified
as appropriate for inducing promoters, selecting transformants, or amplifying the genes
encoding the desired sequences. Heterologous DNA may be introduced into host cells by
any method known in the art, such as transfection with a vector encoding a heterologous
DNA by the calcium phosphate coprecipitation technique or by electroporation.
Numerous methods of transfection are known to the skilled worker in the field.
Successful transfection is generally recognised when any indication of the operation of
this vector occurs in the host cell. Transformation is achieved using standard techniques
appropriate to the particular host cells used.

Incorporation of cloned DNA into a suitable expression vector, transfection of eukaryotic 
cells with a plasmid vector or a combination of plasmid vectors, each encoding one or 
more distinct genes or with linear DNA, and selection of transfected cells are well known 
in the art (see, e.g. Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual,
Second Edition, Cold Spring Harbor Laboratory Press). 

Transfected or transformed cells are cultured using media and culturing methods known 
in the art, preferably under conditions, whereby the fusion protein encoded by the DNA is 
expressed. The composition of suitable media is known to those in the art, so that they
can be readily prepared. Suitable culturing media are also commercially available. 

Preferred bacterial hosts which may be used in the method of the invention include B
strains of E. coli such as BL21 or a K strain such as JM109. These strains are widely
available in the art from academic and/or commercial sources. The B strains are deficient
in the lon protease and other strains with this genotype may also be used. Preferably the
strain should not be defective in recombination genes.

Most preferably the strain is BL21(DE3), as disclosed in Studier et al. (1990). Bacteria
obtainable by selection for improved heterologous polypeptide expression, optionally
cured of the original vector, may also be used as host cells in the present invention.
Particular bacteria include E. coli C43(DE3) (deposited at the European Collection of
Cell Cultures (ECCC), Salisbury, Wiltshire, UK on 4th July 1996 as B96070445); E. coli
C0214(DE3) (deposited at the National Collections of Industrial and Marine Bacteria on
[...] June 1997 as NCIMB 40884); E. coli DK8(DE3)S (deposited at the National
Collections [...]); and E. coli C41(DE3) (deposited at the ECCC on 4th July 1996 as
B96070444). Such bacteria, when cured, provide a host for the expression of fusion
proteins of the invention and are especially suitable for the expression of fusion proteins
whose expression is toxic to bacteria.

F. Production of fusion proteins and their processing.

Host cells of the invention may be cultured under conditions in which expression of the
[...] example affinity chromatography or HPLC. Where small fusion proteins are involved
HPLC is particularly suitable.

The fusion protein may be cleaved, e.g. using an appropriate [...]
polypeptide sequence of interest and this sequence may be recovered from the resulting
mixture of first and second regions of the fusion protein.



Alternatively the fusion protein may find application as such, for example as an
immunogen where the coiled-coils form aggregates. This avoids the necessity of
preparing immunogenic material from small proteins and peptides by coupling them by a
separate chemical reaction to a carrier protein such as keyhole limpet hemocyanin
(KLH).



H. Use in NMR studies 



Fusion proteins according to the invention possess an extremely small fusion partner
[...] experiment without the fusion partner interfering in the spectrum received.

NMR analysis may be performed according to techniques and methodology which are
known in the art, for example as described in K. Wüthrich, "NMR of Proteins and Nucleic
Acids", Wiley, New York, 1986, incorporated herein by reference.



The present invention is illustrated with reference to the following examples. 



Example 1

Preparation of a GroEL fusion vector

The polymerase chain reaction (PCR) is used to generate a DNA fragment containing an
N-terminal histidine tag, the 191-345 fragment of GroEL, a thrombin cleavage site and a multiple
cloning site. The 5'-flanking PCR primer is 5'- AGA CGG ACT GCC ATA TGC ATC ATC
ATC ATC ATC ATG AAG GTA TGC AGT TCG ACC -3'. The 3'-flanking primer is 5'- ATT
GAC CCC AAG CTT CGA ATT CCA TGG TAC CAG CTG CAG ATG TCG AGC TCG GAT
CCA CGC GGA ACC AGA CCA CGG CCC TGG ATT GCA GCT TCT TCA CCC -3'. The
template for the PCR amplification is as described in Zahn et al, (1996) PNAS (USA)
93:15024-15026. The resulting fragment is cloned into Nde I and Hind III digested pRSETA (Invitrogen)
to create pHGro (see fig. 1 ). 
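
The design of the two flanking primers can be checked with a short sketch. The primer sequences are copied from the text above; the helper names are illustrative, and the thrombin recognition motif Leu-Val-Pro-Arg-Gly-Ser (LVPRGS) is assumed here as the commonly used site, since the text does not spell it out.

```python
# Standard genetic code, built in TCAG order (DNA alphabet).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Primers as given in the text, with the spacing removed.
FORWARD = ("AGA CGG ACT GCC ATA TGC ATC ATC ATC ATC ATC ATG "
           "AAG GTA TGC AGT TCG ACC").replace(" ", "")
REVERSE = ("ATT GAC CCC AAG CTT CGA ATT CCA TGG TAC CAG CTG CAG ATG TCG "
           "AGC TCG GAT CCA CGC GGA ACC AGA CCA CGG CCC TGG ATT GCA GCT "
           "TCT TCA CCC").replace(" ", "")


def translate(dna):
    """Translate complete codons only; a trailing partial codon is ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


# The Nde I site (CATATG) supplies the ATG initiation codon.
nde_site = FORWARD.find("CATATG")
n_terminus = translate(FORWARD[nde_site + 3:])

# The 3' primer anneals to the coding strand, so read its reverse complement.
sense = reverse_complement(REVERSE)
frames = [translate(sense[offset:]) for offset in range(3)]

if __name__ == "__main__":
    print(n_terminus)
    print([frame for frame in frames if "LVPRGS" in frame])
    print({name: site in REVERSE for name, site in
           {"HindIII": "AAGCTT", "EcoRI": "GAATTC", "BamHI": "GGATCC"}.items()})
```

Read this way, the 5' primer encodes a methionine, six histidines and the first residues of the GroEL fragment, while the 3' primer carries the cloning sites downstream of the assumed thrombin motif.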

A Fibronectin type III domain of human Tenascin and human FKBP 12 are sub-cloned into
pHGro using BamH I and EcoR I. Residues 2-62 of S. cerevisiae RNase HI are amplified
from genomic DNA by PCR and subcloned via BamHI and EcoRI restriction sites into the
pRSETa vector (Invitrogen), which also contains a fragment of the GroEL chaperone
protein and a histidine tag. The sequences of the primers used for PCR amplification are
as follows:

5'-GCACCTAGGCGTTCCGTTCCCTTGAAGATGCGC (forward)

5'-GGGAATTCAGGAACTTCCATAGTTAGATGTAGTATTTGG (reverse).

[...] Walker J E (1996) J. Mol Biol 260, 289-298) which is used for the expression [...]
2xTY medium. Transformants are obtained using a polyethylene glycol method (Chung,
(1989) Proc. Natl. Acad. Sci. USA 86, 2172-2175).

Example 2

Expression and purification of fusion proteins

[...] litre shake flask containing 0.25 litre of 2xTY medium plus ampicillin at 50 µg per ml is
[...] β-D-thiogalactoside (IPTG) [...] 20 hours after induction by centrifugation and
re-suspended in [...] sodium phosphate buffer pH 7.2 + 150 mM NaCl + 10 mM
β-mercaptoethanol [...] concentration). The suspension is sonicated on power [...] and
3 seconds cooling on ice for a total of [...] sonication buffer, re-sonicated and
re-centrifuged.

Purification is performed in a batch-wise manner. The centrifuged protein solutions are
combined and 10 ml of Ni2+ charged iminodiacetic acid resin (Sigma) is added, [...]
stirred [...] Trizma base / Trizma HCl (Sigma) buffer pH [...] + 10 mM
β-mercaptoethanol. Centrifugation [...] process is repeated for each [...]. 250 mM
imidazole is used to elute the fusion proteins from [...] β-mercaptoethanol is used for
human FKBP12, pH [...] for [...] and RNase HI. It is necessary to include 150 mM NaCl
during the elution of [...]. Thrombin (Sigma) [...] is added to FKBP 12 and 50 units to
the other two. After [...] room temperature the purifications are analysed by SDS-PAGE
[...]. Protein concentrations are [...]
Serum Albumin is used to produce the calibration curve.

The test proteins are produced to an average of 400 mg per litre of culture, which is
approximately 30% of the total soluble protein. All three fusion proteins behave in a typical
manner during metal affinity chromatography, and thrombin removes the GroEL fragment
successfully in each case (see FIG. 2). Tenascin and RNase HI only require a small quantity of
thrombin for the complete removal of GroEL. In the case of FKBP 12, a small amount of fusion
protein remains after the treatment with thrombin. This has also been experienced with other
FKBP 12 fusion proteins where thrombin has been used and is to be expected.

The 191-345 apical fragment of GroEL with an N-terminal histidine tag satisfies the criteria for a
good fusion protein. It can be over-expressed to high levels as soluble fusion proteins in E. coli,
is small and can be purified easily using nickel affinity chromatography. Being monomeric, this
expression system does not suffer from the problems associated with the expression of
multimeric proteins with dimeric fusion proteins.



Example 3 



NMR sample preparation 

Uniform labelling of proteins with 15N, or 15N and 13C, is achieved by growing cells in
minimal media containing 15NH4Cl or 13C6-glucose as nitrogen and carbon sources
respectively. A 10% 13C-labelled protein is produced by incorporating 10% 13C6-glucose
and 90% unlabelled glucose into the growth medium. Half litre cultures are grown at 28 °C,
250 rpm shaking, to an optical density of 0.2 AUs at 600 nm. Protein expression is
induced for 16 h with 0.2 mM isopropyl-D-thiogalactoside and harvested cells are
resuspended in 16 mM Na2HPO4, 4 mM NaH2PO4·H2O, 150 mM NaCl and 10 mM
β-mercaptoethanol. Cells are subject to two rounds of sonication and cell lysates are
centrifuged at 17,000 r.p.m. for 30 min. The supernatant is applied to a nickel affinity
column (Sigma) and the fusion protein eluted with 50 mM Tris-HCl pH 8.4, 150 mM
NaCl, 10 mM β-mercaptoethanol and 250 mM imidazole. Thrombin digestion of the
fusion protein, using 5 U thrombin per ml protein, released the RNase HI fragment from
the GroEL tag fragment. This is carried out for 2 h at room temperature. The RNase HI
is purified from the GroEL fragment using a Heparin HyperD column (Sigma) with a
gradient of 1 M NaCl (0-100%) in 50 mM Tris-HCl pH 8.4 and 10 mM
β-mercaptoethanol. RNase HI containing fractions are dialysed overnight against 50 mM
acetate buffer pH 3.6 and 5 mM DTT, and concentrated in an Amicon concentrator.

NMR samples contained approximately 2 mM protein in 50 mM acetate buffer pH 3.6
and 5 mM DTT, in either H2O with 10% D2O or 100% D2O.

Claims



1. A fusion protein comprising:

a) a first region comprising a fragment of a chaperone polypeptide; and

b) a second region not naturally associated with the first region comprising a
polypeptide sequence of interest.

2. A fusion protein according to claim 1 which further comprises a cleavable linker
region between the first and second regions.

3. A fusion protein according to claim 1 or 2 wherein the first region is at or
proximal to the N-terminus of the protein.

4. A fusion protein according to any one of the preceding claims wherein the
polypeptide sequence of interest is from 2 to 250 amino acids in length.

5. A fusion protein according to any one of the preceding claims wherein the
polypeptide sequence of interest is of eukaryotic origin.

6. A fusion protein according to any one of the preceding claims wherein the first
region comprises a polypeptide selected from the group consisting of residues 191-375,
191-345 and 193-335 of E. coli GroEL.



7. A nucleic acid encoding the fusion protein according to any one of the preceding
claims.

8. An expression vector comprising the nucleic acid of claim 7 operably linked to a
promoter.

9. An expression vector comprising a first nucleic acid sequence encoding a
fragment of a chaperone polypeptide operably linked to a promoter capable of expressing
the first nucleic acid sequence in a host cell and, linked to the nucleic acid sequence, a
cloning site permitting the insertion of a second nucleic acid sequence such that it is
capable of being expressed in fusion with the first nucleic acid sequence.

10. A host cell transformed with the expression vector of claim 8 or claim 9.

11. A method of preparing a fusion protein comprising:

(i) transforming a host cell according to claim 10, which method comprises culturing
the host cell under conditions which provide for the expression of the fusion protein from
the expression vector within the host cell; and

(ii) recovering the fusion protein.

12. A method according to claim 11 wherein the host cell is E. coli.

13. A method according to claim 12 wherein the expression vector comprises a
bacteriophage T7 promoter.

14. A method according to any one of claims 11 to 13 wherein the fusion protein
further comprises a protease cleavable linker region between the first and second regions
and which method further comprises cleaving the protein at the protease cleavable linker
and recovering the second region.

15. A polypeptide when prepared by the method of any one of claims 11 to 14.

16. Use of a polypeptide capable of forming a coiled coil structure as a fusion partner
in the construction of a fusion protein.

17. Use according to claim 16, wherein the fusion protein is a fusion protein according
to any one of claims 1 to 7.

18. Use of a fusion protein according to any one of claims 1 to 6 in NMR studies.

soluble, native human proinsulin in E. coli. We fused proinsulin to the C-terminus of the [...]
[...] by addition of L-arginine or ethanol to the culture medium. [...] reserved.

Keywords: Proinsulin; Secretory expression; DsbA fusion protein; Medium additives; Arginine 

E-mail address: rudolph@biochemtech.uni-halle.de (R. Rudolph).

1. Introduction

Nearly 0.7% of the world population suffers
from insulin-dependent diabetes with a continuously
increasing number of patients. Consequently,
the requirement for recombinant insulin
is increasing. Insulin is a 51 amino acid polypeptide,
consisting of two separate chains. Both A-
and B-chain are connected by two intermolecular
disulfide bonds. The A-chain contains an additional
intramolecular disulfide bond. In the human
pancreas, insulin is produced as a single
polypeptide chain, proinsulin, with the A-chain
and B-chain joined by the connecting peptide 
(C-peptide) (Mackin, 1998). After synthesis, 
folding and concomitant disulfide bond forma- 
tion, native proinsulin is converted to insulin by 
enzymatic cleavage whereby the C-peptide is 
cleaved off and the carboxy-terminus of the B-chain
is trimmed. Processing of proinsulin in
vitro can be performed with trypsin and carboxypeptidase
B (Kemmler et al., 1971).

Mature insulin is so far not producible in 
procaryotic host cells in its native conformation 
by recombinant techniques, essentially because 
correct disulfide bond formation only occurs at 
the level of proinsulin. Several strategies for 
production of proinsulin in the cytoplasm of Es- 
cherichia coli as insoluble inclusion bodies have 
been described and are commercially used (Tang 
and Hu, 1993; Kang and Yoon, 1991; Sung et 
al., 1986). The great advantage of these ap- 
proaches is that proinsulin can be produced in 
high amounts, but the complex process of purifi- 
cation and the formation of the correct disulfide 
bonds during folding are critical cost factors. 
Alternatively, proinsulin has been produced in 
E. coli and secreted by routing the recombinant 
protein using appropriate signal sequences to the 
periplasmic space. In this case, however, the 
yield of correctly folded proinsulin was very low
compared to the yield obtained by intracellular
production of proinsulin (Talmadge et al., 1981;
Chan et al., 1981). Alternative host organisms,
such as Bacillus subtilis, Streptomyces lividans,
and Saccharomyces cerevisiae, were also used
for secretory expression of the proinsulin gene
(Novikov et al., 1990; Koller et al., 1989; Thim
et al., 1986), but the yield was not significantly
increased in comparison to that obtained in E. 
coli. The first report of an efficient secretory ex- 
pression of a modified proinsulin gene in E. coli 
has been published by Kang and Yoon (1994). 
The authors constructed a so-called ZZ-proin- 
sulin fusion construct in which the C-peptide 
was either totally deleted or drastically short- 
ened (only 1-11 amino acids remained), which 
resulted in significantly increased expression 
yields. 



The stability against degradation of a recom- 
binant protein produced in E. coli is strongly 
dependent on its folding efficiency in vivo. 
Therefore, the bacterial periplasm principally 
represents the most favourable compartment for 
proinsulin production. The periplasm provides 
oxidising conditions and proteins of the Dsb 
family (DsbA, DsbC and DsbG), which are effi- 
cient catalysts of disulfide bond formation in the 
periplasm (Bessette et al., 1999; Raina and Mis- 
siakas, 1997). DsbA is the most important ox- 
idase of free sulfhydryl groups in the periplasm. 
It has been shown to be useful for in vitro fold- 
ing of disulfide-bonded proteins, such as BPTI, 
hirudin, α-lactalbumin, alkaline phosphatase,
and bovine ribonuclease A (Zapun and
Creighton, 1994; Wunderlich et al., 1993;
Akiyama et al., 1992). A previous study in 
which enterokinase was fused to the C-terminus 
of DsbA showed that high amounts of soluble, 
native enterokinase can be produced in the 
periplasm (Collins-Racie et al., 1995). In con- 
trast, coexpression of DsbA only yielded 
periplasmic IGF-I inclusion bodies (Joly et al., 
1998). In the case of the secreted α-amylase/trypsin
inhibitor RBI the yield was not improved
by DsbA coexpression alone but could
be strongly increased in combination with the
addition of reduced glutathione to the cultivation
medium (Wunderlich and Glockshuber, 1993).

In this study we investigate the periplasmic 
production of proinsulin in E. coli as a C-termi- 
nal fusion to DsbA. The rationale of this ap- 
proach is the assumption that DsbA should 
stabilise unfolded proinsulin intramolecularly via 
its polypeptide binding site (Frech et al., 1996)
and promote correct disulfide bond formation 
by intra- and intermolecular catalysis. We show 
that DsbA-proinsulin is efficiently secreted to 
the periplasm and that the proinsulin moiety in 
soluble fusion protein is correctly folded. Fur- 
ther significant improvements of production of 
the fusion protein could be obtained by optimi- 
sation of the bacterial growth conditions and by 
addition of low molecular weight folding enhancers
to the cultivation medium.

2. Materials and methods

2.1. Bacterial strains and plasmids 

The E. coli strains used in this study are XL1-Blue
blue (Stratagene) as host strain for routine 
cloning experiments, and the expression strains 
SF131 (Meerman and Georgiou, 1994), BL21 
(Novagen), JM109 (Stratagene), C600, and 
RB791 (both E. coli Genetic Stock Center, New 
Haven). The plasmid pDsbA3 (Jonda et al., 1999) 
was used as cloning vector (Fig. 1). Vector pRK5- 
PI (Mather and Ullrich, 1987) was the source of 
the proinsulin gene. 

2.2. DNA constructions 

Plasmid preparation was carried out using a 
Qiagen plasmid miniprep kit according to the 
manufacturer's instructions. Restriction enzyme digestion
and ligation were carried out according to
Sambrook et al. (1989). The vector pDsbA3-PI
was constructed by inserting the human proin- 
sulin gene into pDsbA3 downstream of the dsbA 
gene. Additionally, a sequence of 18 nucleotides 
was introduced between the 3' end of dsbA and 
the 5' end of the proinsulin gene, encoding a 
linker with five glycine and one arginine residue 
(Fig. 1). The human proinsulin gene was amplified 
using plasmid pRK5-PI as template, the forward 
primer 5'-d(GAAATATCTGTCCGAGAAAAAAGGCGGTGGGGGTGGCCGCTTTGTGAACCAACACCTG),
and the reverse primer 5'-d(CTCTTGATGACGTTGATCCCTAGGCTTAAGCAC).
The dsbA gene was amplified from pDsbA3 with the forward primer
5'-d(GCGACTGGAATTCCATATGGCGCAGTATGAAGATG), and the reverse primer
5'-d(CTTTATAGACAGGCTCTTTTTTCCGCCA). PCR
was carried out using Tfl polymerase (Promega) 
at 95, 55, and 72°C for denaturation, annealing, 
and synthesis, with 30 cycles. The amplified PCR products were used as templates for
the following fusion PCR. First, to allow annealing of homologous regions at the 3' end
of the dsbA gene and the 5' end of the proinsulin gene, both PCR products were incubated
in a PCR reaction without primer at an annealing temperature of 40°C, with five cycles.
The next 25 cycles were carried out at an annealing temperature of 62°C after
addition of the 5'-forward primer used for dsbA-PCR and the 3'-backward primer used for
proinsulin-PCR. The amplified PCR product was cloned between the AscI and BamHI sites
of pDsbA3. The DNA sequence encoding the entire fusion protein was determined using a
sequencing kit (Biozym) and a LI-COR sequencer (MWG Biotech).

[Vector map of pDsbA3-PI; labelled elements: dsbA, proinsulin]

5'-TCC GAG AAA AAA GGC GGT GGG GGT GGC CGC TTT GTG AAC CAA
   Ser186 Glu187 Lys188 Lys189 Gly Gly Gly Gly Gly Arg Phe196 Val197 Asn198 Gln199

Fig. 1. Map of the vector pDsbA3-PI. The restriction sites used for cloning are indicated.
The linker sequence between DsbA (residues 1-189) and proinsulin (residues 196-281) is
shown in italics.
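
The junction sequence given in Fig. 1 can be checked against the stated linker design (five glycine residues and one arginine residue between DsbA residue 189 and proinsulin residue 196). The following is a minimal sketch, not part of the original methods: the codon table is deliberately restricted to the codons that occur in the junction, and the names `CODON_TO_AA`, `translate` and `linker` are illustrative.

```python
# Codons of the DsbA/linker/proinsulin junction as printed in Fig. 1.
JUNCTION_CODONS = (
    "TCC GAG AAA AAA GGC GGT GGG GGT GGC CGC TTT GTG AAC CAA".split()
)

# Partial codon table (standard genetic code), limited to the codons above.
CODON_TO_AA = {
    "TCC": "Ser", "GAG": "Glu", "AAA": "Lys",
    "GGC": "Gly", "GGT": "Gly", "GGG": "Gly",
    "CGC": "Arg", "TTT": "Phe", "GTG": "Val",
    "AAC": "Asn", "CAA": "Gln",
}

FIRST_RESIDUE = 186  # numbering of the first codon (Ser186), taken from Fig. 1


def translate(codons, start):
    """Return (residue number, amino acid) pairs for a list of codons."""
    return [(start + n, CODON_TO_AA[codon]) for n, codon in enumerate(codons)]


residues = translate(JUNCTION_CODONS, FIRST_RESIDUE)

# Residues 190-195 lie between DsbA (1-189) and proinsulin (196-281).
linker = [aa for position, aa in residues if 190 <= position <= 195]
print(linker)
```

Under these assumptions the six linker codons (18 nucleotides) translate to Gly-Gly-Gly-Gly-Gly-Arg, and proinsulin begins with Phe196.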

2.3. Expression of DsbA-proinsulin

E. coli cells harbouring pDsbA3-PI were grown 
at 25°C in shake flask cultures in LB or mineral 
salt medium containing glucose (10 g l⁻¹) and
ampicillin (100 µg ml⁻¹). The composition and
preparation of the mineral salt medium has been
described previously (Teich et al., 1998). After the
cells reached an optical density (OD500) of 1,
IPTG was added to 1 mM, the culture medium
was supplemented with medium additives and the
pH value was adjusted by adding HCl or NaOH,
and the cells were grown for an additional 6 h.

The preparation of periplasmic fractions was 
performed by osmotic shock according to Kang 
and Yoon (1994). Generally, 100 µl of extraction
buffer were added to a cell pellet of 1 ml broth at
OD500 = 1. The soluble periplasmic proteins and
the residual insoluble proteins were analysed by 
tricine-SDS-PAGE (13%). For immunoblot analy- 
sis, the separated proteins were electroblotted 
onto a nitrocellulose membrane (Pall), incubated 
with a monoclonal mouse antibody against hu- 
man insulin (Roche), and visualised by using the 
ECL detection system (Amersham). 

2.4. Analysis of native proinsulin in periplasmic 
fractions 

Trypsin converts the native DsbA-proinsulin 
fusion protein into DsbA (plus linker peptide), 
C-peptide, and human insulin. This human insulin
contains an additional arginine residue at the
C-terminus of the B-chain because further trimming
by carboxypeptidase B digestion was omitted.
The cleavage procedure was performed at pH
8.0 (10 mM Tris/HCl, 2 mM EDTA) on ice with 
different trypsin concentrations in the range of 
1-200 µg trypsin per 100 µl of the periplasmic
fraction. Optimal release of insulin from the fusion
protein was obtained by incubation with 100
µg trypsin for 10 min (data not shown). Proteolysis
was stopped by addition of soybean trypsin
inhibitor (STI, Sigma) in a 10-fold molar excess 
over trypsin. The released insulin was analysed by 
a specific insulin-ELISA (Roche). It is a sand- 
wich-ELISA with matrix-bound monoclonal 
anti-human insulin antibodies and peroxidase-conjugated
monoclonal anti-human insulin antibodies
recognising only native insulin or native
proinsulin. Using a standard curve created with 
native insulin, the amount of proinsulin produced 
in E. coli was calculated. The concentration of 
native proinsulin obtained after cleavage with 
trypsin is given as mg native human proinsulin 
per g dry cell weight (mg g⁻¹) or per liter cultivation
broth (mg l⁻¹). All experiments have been
repeated at least twice and each sample was mea- 
sured in three to five dilutions. 

2.5. Purification of DsbA-proinsulin 

Periplasmic fractions from 0.5 l of bacterial
culture containing DsbA-proinsulin were adjusted 
to 1.2 M ammonium sulfate by gently adding 
ice-cold 3.5 M ammonium sulfate in 20 mM 
Tris/HCl (pH 7.4), incubated on ice for 30 min 
and centrifuged (48 000 × g, 4°C, 30 min). The
pellet containing DsbA-proinsulin was resus- 
pended in 5 ml ice-cold 10 mM MOPS/NaOH pH 
7.0 (buffer A) and homogenised using an Ultra-Turrax
for 2 min. After dialysis overnight against
buffer A, the protein fraction was loaded onto a 
DEAE-Sepharose ff column (Pharmacia, 5 ml 
column volume). DsbA-proinsulin was eluted in a 
linear gradient from 0 M (buffer A) to 0.5 M 
NaCl in buffer A over 10 column volumes. Frac- 
tions of 2 ml were collected and the protein 
concentration was determined from UV spectra 
[ε(280 nm, 1 cm) = 34 140 M⁻¹ cm⁻¹]. Fractions of
each purification step were analysed by SDS-PAGE
(12%). Trypsin digestions of purified
DsbA-proinsulin were performed on ice at pH 
7.5 (8 mM MOPS, 2 mM Tris/HCl, 180 mM 
NaCl) with a protein concentration of 100 µg
ml⁻¹ and a mass ratio DsbA-proinsulin:trypsin
of 10:1. Digestion products were analysed by
RP-HPLC and tricine-SDS-PAGE (18%). Re- 
versed phase HPLC was performed with a linear 
gradient from 20 to 50% (v/v) acetonitrile in 
0.1% TFA on a C18 column (ET 125/3 NUCLEOSIL
100-5 C18 PPN, Macherey-Nagel) and
the eluted digestion products were analysed by 
mass spectrometry. For SDS-PAGE analysis the 
digested fractions were precipitated by adding 
TCA to a final concentration of 15% (w/v), 
loaded onto tricine-SDS-gels (18%) under reduc- 
ing conditions and blotted onto nitrocellulose 
for immunoblotting as described above, or on a 
PVDF membrane for N-terminal sequencing. 
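
The protein concentration determination in section 2.5 relies on the Beer-Lambert law, c = A / (ε · l), with the extinction coefficient of 34 140 M⁻¹ cm⁻¹ at 280 nm quoted above. The sketch below only illustrates that arithmetic; the function name and the absorbance reading are hypothetical and do not come from the study.

```python
EPSILON_280 = 34140.0  # M^-1 cm^-1, extinction coefficient quoted in section 2.5


def molar_concentration(a280, path_cm=1.0, epsilon=EPSILON_280):
    """Beer-Lambert law: concentration (mol per litre) = A / (epsilon * l)."""
    if path_cm <= 0 or epsilon <= 0:
        raise ValueError("path length and extinction coefficient must be positive")
    return a280 / (epsilon * path_cm)


# Hypothetical absorbance reading, used for illustration only.
example_a280 = 0.50
conc_molar = molar_concentration(example_a280)
print(f"{conc_molar * 1e6:.1f} micromolar")  # prints "14.6 micromolar"
```

Converting the molar value into mg ml⁻¹ would additionally require the molecular mass of the fusion protein, which is not stated in this section.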



3. Results 

3.1. Expression of DsbA-proinsulin and 
determination of proinsulin 

For secretory expression of the human proin- 
sulin gene, we constructed the plasmid pDsbA3- 
PI (Fig. 1). Using this vector dsbA-proinsulin 
expression is controlled by a trc promoter and 
the fusion protein is translocated to the 
periplasm by the dsbA signal sequence. The fu- 
sion protein was quantitatively secreted to the 
periplasm as the precursor form could not be 
detected by immunoblotting (data not shown). 
Unexpectedly, only ~10% of the fusion protein
was found in the soluble fraction, while ~90%
formed insoluble periplasmic inclusion bodies 
(Fig. 2(A-B), lanes 2 and 3). After trypsin di- 
gestion of the soluble periplasmic fraction the 
concentration of the generated insulin was measured
by an insulin-ELISA and correlated to
proinsulin based on a standard curve. Trypsin 
digestion of the fusion protein proved to be es- 
sential for quantitative determination of proin- 
sulin, indicating that within the fusion protein 
the proinsulin part was not completely accessible 
to the ELISA antibody. 



3.2. Expression of DsbA-proinsulin in different
host strains at different growth temperatures 

In order to achieve efficient production of the 
fusion protein in soluble form with a native 
proinsulin part, the expression was performed 
with E. coli SF131. In this strain, the genes 
degP, ompT, and ptr were deleted (Meerman 
and Georgiou, 1994). Ptr encodes Protease III, 
which has been shown to degrade insulin and 
other secreted fusion proteins. Thus, we as- 
sumed that in this strain proteolytic degradation 
of the soluble fusion protein should be lowered. 
LB was used as cultivation medium and the ex- 
pression was analysed at growth temperatures 
from 20 to 30°C over 24 h after induction. Cul- 
tivation at 25°C for 4-6 h was found to be 
optimal for the production of soluble fusion 
protein with native proinsulin part (data not 
shown). The maximum amount of proinsulin 
was ~0.17 mg g⁻¹, corresponding to 0.56 mg
g⁻¹ of the full length fusion protein. Despite
the use of the strain SF131 as a host, the yield 
of DsbA-proinsulin was still rather low. There- 
fore, we investigated expression of the fusion 
protein in other host strains, such as E. coli 
JM109, BL21, C600, and RB791. The yield of 
periplasmic proinsulin, detected after trypsin digestion
(10 µg ml⁻¹), was highest in the E. coli
strains C600 and RB791 (Table 1). In case of E.
coli RB791 the cells grew much faster than 
C600 cells resulting in a higher cell density. 
Therefore, for further analysis, we compared the 
amount of soluble fusion protein produced in 
RB791 during growth at 25 and 37°C, respec- 
tively. The overall expression yield was highest 
at 37°C, but the highest amount of soluble fu- 
sion protein was found when cells were grown 
at 25°C (data not shown). 
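
The conversion between proinsulin and full length fusion protein quoted above (0.17 mg g⁻¹ proinsulin corresponding to 0.56 mg g⁻¹ DsbA-proinsulin) amounts to scaling by the mass fraction of the proinsulin part. The sketch below reproduces that arithmetic. The approximate molecular masses used for the cross-check (about 9.4 kDa for proinsulin and about 31 kDa for the fusion protein) are assumptions introduced here, not values given in the text.

```python
# Values reported for E. coli SF131 in section 3.2.
PROINSULIN_MG_PER_G = 0.17
FUSION_MG_PER_G = 0.56

# Mass fraction of the proinsulin part implied by the two reported yields.
mass_fraction = PROINSULIN_MG_PER_G / FUSION_MG_PER_G

# Assumption: approximate molecular masses in kDa, used only as a plausibility check.
APPROX_PROINSULIN_KDA = 9.4
APPROX_FUSION_KDA = 31.0
expected_fraction = APPROX_PROINSULIN_KDA / APPROX_FUSION_KDA


def fusion_equivalent(proinsulin_mg_per_g, fraction=mass_fraction):
    """Convert a proinsulin yield into the equivalent fusion protein yield."""
    return proinsulin_mg_per_g / fraction


print(f"implied mass fraction:  {mass_fraction:.2f}")
print(f"expected mass fraction: {expected_fraction:.2f}")
print(f"fusion protein equivalent of 0.17 mg/g: {fusion_equivalent(0.17):.2f} mg/g")
```

Both fractions come out at roughly 0.30, so the reported pair of yields is consistent with a simple mass ratio of the two polypeptides.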



3.3. Conversion of purified DsbA-proinsulin to 
insulin by trypsin 

In order to identify the digestion products of 
DsbA-proinsulin after trypsin treatment we 
purified the fusion protein from the periplasmic 
fraction (Fig. 3(A)). DsbA-proinsulin was first 
quantitatively precipitated with 1.2 M ammo- 
nium sulfate. The precipitate contained only two 
major impurities with an apparent molecular 
mass of ~ 43 and 50 kDa, which could be sepa- 
rated by ion exchange chromatography on 
DEAE-Sepharose. Further optimisation of the 
purification of DsbA-proinsulin was not at- 
tempted. Fractions containing homogeneous 
DsbA-proinsulin which consisted of ~10% of 
the total fusion protein were used for trypsinisa- 
tion. The overall yield of purified protein was 
~1 mg DsbA-proinsulin from 0.5 l of culture
medium (OD500 = 1). After trypsin digestion, the
cleavage products were separated on tricine-SDS-gels
(18%) or reversed phase HPLC. Under
optimal cleavage conditions, the fusion protein
was rapidly converted to insulin (Fig. 3(B)). After
1 min, nearly the total fusion protein was
digested and one major protein with an apparent
molecular mass of 23 kDa appeared, which



Table 1
Levels of native DsbA-proinsulin in periplasmic extracts of different E. coli strains

E. coli host strain     Amount of proinsulin (a)
                        mg g⁻¹      mg l⁻¹
SF131                   0.17        0.02
JM109                   0.25        0.06
BL21                    0.32        0.11
C600                    0.83        0.24
RB791                   0.83        0.29

(a) Cells were cultivated at 25°C, harvested 4 h after induction
and the periplasmic proteins were obtained after osmotic
shock. After trypsin digestion (1 µg trypsin per 100 µl periplasmic
fraction), the released insulin was quantified by an insulin-ELISA,
which was used to deduce the initial concentration of
proinsulin. The given yields correspond to mg of native human
proinsulin per g dry cell weight (mg g⁻¹) and mg of native
human proinsulin per liter of cultivation broth (mg l⁻¹),
respectively.

Fig. 3. DsbA-proinsulin purification and trypsin digestion
products. A: SDS-PAGE (12%) analysis of recombinant
DsbA-proinsulin purified from the periplasmic fraction of E.
coli RB791. Soluble periplasmic fraction (lane 1), supernatant
after ammonium sulfate precipitation (lane 2), and pellet after
dialysis against 10 mM MOPS/NaOH pH 7.0 (lane 3), purified
DsbA-proinsulin after ion exchange chromatography (lane 4).
For details see the methods section. B: Trypsin digestion
products of purified DsbA-proinsulin were analysed on reducing
tricine-SDS-gels (18%). DsbA-proinsulin (20 µg) was incubated
with trypsin in a mass ratio of 10:1 for 0, 0.17, 1, 10,
and 50 min (lanes 2-6); lanes 7 and 8, human insulin standard;
molecular weight markers (M) are indicated in lanes 1
and 6 (A) and lane 1 (B).

is further degraded to a protein with ~20 kDa.
The proteins correspond to C-terminally trun- 
cated fragments of DsbA-proinsulin and DsbA, 
respectively, as revealed by N-terminal sequenc- 
ing. Simultaneously, a set of protein bands with 
an apparent molecular mass of 3-4 kDa, corre- 
sponding to the A- and B-chain of insulin and the 
C-peptide, could be observed. By reversed phase 
HPLC and mass spectrometry the digestion products
of DsbA-proinsulin were identified as the
expected insulin with an additional arginine 
residue at the C-terminus of the B-chain and the 
C-peptide. 

3.4. Optimisation of the yield of native 
DsbA-proinsulin using medium additives

To further increase the yield of soluble fusion 
protein with a native proinsulin part we investi- 
gated the effect of different medium additives 
during cultivation of transformed RB791 in LB 
medium. We chose pH values of the culture 
medium from 4 to 9 and L-arginine (0.1-0.5 M), 
reduced glutathione (2-10 mM), and ethanol (1- 
5% v/v) as medium additives. For L-arginine it 
has been shown that it increases the refolding 
yield of recombinant proteins in vitro (Rudolph et 
al., 1997). Reduced glutathione allows disulfide 
bond isomerisation in disulfide-scrambled folding 
intermediates in vitro and in vivo (Zapun and 
Creighton, 1994; Wunderlich and Glockshuber, 
1993; Akiyama et al., 1992). The addition of 
ethanol to E. coli cultures seems to induce the 
heat shock response and the production of molecular
chaperones, and leads to an enhanced production
of heterologous proteins in E. coli
(Kusano et al., 1999; Van Dyk et al., 1995). In 
addition, similarly to the in vitro folding of IGF- 
I, ethanol may influence the oxidative folding of 
proinsulin in the periplasmic space (Hejnaes et al.,
1992). Since the pH value is a key factor in 
disulfide bond formation, we also investigated the 
production of native proinsulin at different pH 
values in the cultivation medium. 

In all experiments E. coli RB791 was used as 
expression strain. Cells were cultivated at 25°C for 
6 h after induction, and DsbA-proinsulin in the 
periplasmic extracts was digested with trypsin. 
SDS-PAGE and immunoblot analysis showed
that L-arginine increased the amount of soluble 
periplasmic fusion protein 3-4 fold at a pH of 
~6.5-7.0 in the medium (Fig. 2(A-B), lanes 2
and 4). The ratio of soluble to insoluble fusion
protein was ~20:80% without additive and
60:40% with 0.4 M L-arginine. The maximum
yield of proinsulin was 8.9-9.2 mg g⁻¹ when 0.3
or 0.4 M L-arginine was added to the cultures
(Fig. 4). Although both concentrations of L-arginine
increased the cell-specific yield of proinsulin
to the same extent, the concentration of 0.4
M had a more negative effect on cell growth (see 
Table 2). Concentrations above 0.4 M L-arginine 
resulted in complete growth inhibition (data not 
shown). Interestingly, the amount of soluble and 
correctly folded DsbA-proinsulin produced in E. 
coli did not depend on the pH of the medium in a 
range of pH 5-7. At pH 8-9, the growth of the cells
was strongly inhibited and hardly any native proin- 
sulin was detectable in the corresponding samples. 

3.5. Optimisation of DsbA-proinsulin expression 
in mineral salt medium 

In order to analyse the influence of the medium 
composition on the yield of native fusion protein, 
cultivations were also performed in mineral salt 
medium. Generally, the yields of native proinsulin 
were ~ 10-fold lower compared to growth in LB 
medium and nearly all fusion protein seemed to 
appear in inclusion bodies. Supplementation of the 




Table 2
Levels of the secreted native DsbA-proinsulin in periplasmic
extracts of E. coli RB791 harboring pDsbA3-PI
after growth in media with different supplements

Medium/supplement                             Native proinsulin (a)
                                              mg g⁻¹      mg l⁻¹
LB medium
  Without additives                           2.7         2.5
  0.3 M L-arginine                            8.9         2.5
  0.4 M L-arginine                            9.2         1.1
Mineral salt medium
  Without additives                           1.1         0.7
  1% (v/v) ethanol                            2.6         3.1
  10 g l⁻¹ yeast extract                      1.5         1.6
  3% (v/v) ethanol + 10 g l⁻¹ yeast extract   1.6         1.8

(a) The amount of proinsulin was determined as described in
Table 1. The results of one representative experiment are
shown.

Fig. 4. Influence of medium additives on the yield of native
DsbA-proinsulin produced in E. coli RB791 with plasmid
pDsbA3-PI. Native insulin was measured by an insulin-ELISA
after trypsin digestion with 100 µg trypsin per 100 µl periplasmic
extract. Cells were cultivated in LB medium (panel A) or
mineral salt medium (panel B). (A) supplementation of 0.3 or
0.4 M L-arginine (bars 3-4) and (B) supplementation with 1%
ethanol (bar 3) or 3% ethanol and 10 g l⁻¹ yeast extract (YE)
(bar 4). The proinsulin content measured in cells grown in LB
or mineral salt medium without supplement was set to 1 (bar
2) and the yield (as mg g⁻¹) from the cultivations in the
corresponding medium was related to that. As a control, E.
coli RB791 transformed with a control plasmid lacking the
proinsulin gene was grown in LB medium and mineral salt
medium, respectively (bar 1).

cultures with L-arginine, reduced glutathione, adjustment of the pH value and the
combination of all parameters always resulted in lower yields compared to LB medium
(data not shown). In contrast, the addition of yeast extract (5-20 g l⁻¹) and/or ethanol
(1-5% v/v) to the mineral medium improved the yield of native fusion protein 2-3 fold
to 2.6 mg g⁻¹ (Table 2). These results are in accordance with investigations of
Ben-Bassat et al. (1984), who found that the addition of yeast extract to the growth
medium improves the yield of soluble heterologous proteins produced in E. coli. In our
case, the amount of insoluble DsbA-proinsulin was similar under both conditions
(cultivation ± ethanol), but the ratio of soluble protein to total fusion protein was
significantly improved by the addition of ethanol (Fig. 2(A-B), lanes 8 and 10).
Additionally, yeast extract or ethanol stimulated the growth of the bacteria, leading to
a higher final cell density, and thus to a higher amount of proinsulin per ml cultivation
broth. Using this approach, the maximal yield of native proinsulin within the fusion
protein in cultures with 1% (v/v) ethanol was 3.1 mg l⁻¹. This was the highest amount
obtained in the whole optimisation series in mineral salt medium, although the specific
yield in this case was only 2.6 mg g⁻¹ (Table 2).
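
The fold increases discussed for the medium additives can be recomputed from the yields in Table 2. The sketch below takes the values as read from that table (specific yield in mg g⁻¹, volumetric yield in mg l⁻¹) and relates each supplemented culture to the culture without additives in the same medium. The dictionary layout and the function name are illustrative choices, not part of the original analysis.

```python
# (specific yield in mg per g dry cell weight, volumetric yield in mg per litre),
# values as read from Table 2.
TABLE_2 = {
    "LB medium": {
        "without additives": (2.7, 2.5),
        "0.3 M L-arginine": (8.9, 2.5),
        "0.4 M L-arginine": (9.2, 1.1),
    },
    "mineral salt medium": {
        "without additives": (1.1, 0.7),
        "1% ethanol": (2.6, 3.1),
        "10 g/l yeast extract": (1.5, 1.6),
        "3% ethanol + 10 g/l yeast extract": (1.6, 1.8),
    },
}
SPECIFIC, VOLUMETRIC = 0, 1  # column indices within each tuple


def fold_changes(medium, column):
    """Yield of each supplemented culture relative to the unsupplemented one."""
    rows = TABLE_2[medium]
    baseline = rows["without additives"][column]
    return {
        name: values[column] / baseline
        for name, values in rows.items()
        if name != "without additives"
    }


for medium in TABLE_2:
    for label, column in (("mg/g", SPECIFIC), ("mg/l", VOLUMETRIC)):
        for name, fold in fold_changes(medium, column).items():
            print(f"{medium:20s} {name:35s} {label}: {fold:.1f}-fold")
```

With these inputs, 0.4 M L-arginine gives roughly a 3.4-fold higher specific yield in LB medium while the volumetric yield falls below the baseline, which matches the reported negative effect of this concentration on cell growth.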

4. Discussion

We studied the secretory expression of a dsbA-proinsulin
fusion construct in E. coli in dependence
on medium additives and cultivation
conditions. The yield of native proinsulin ob- 
tained from the DsbA-proinsulin was 1000-fold 
higher than the secretory expression yield de- 
scribed for isolated proinsulin so far (Chan et al., 
1981). This indicates that DsbA is a very effective 
fusion partner for obtaining high amounts of 
soluble and correctly disulfide bridged proinsulin. 
Similarly, the fusion of the enterokinase gene to 
dsbA and its secretory expression resulted also in 
an increased yield of soluble protein in the 
periplasm of E. coli (Collins-Racie et al., 1995). A 
different approach to produce high amounts of 
proinsulin or modified proinsulin from a protein 
fusion has been described by Kang and Yoon 
(1994). The production of a ZZ-proinsulin fusion 
protein resulted in 0.8 mg l⁻¹ proinsulin, which is
four times less compared to our approach. However,
for a modified ZZ-proinsulin the yield was
approximately five times higher than in our case.
Although this modified fusion protein was not 
completely secreted to the periplasm the authors 
showed that the secretion efficiency of the short- 
ened protein was significantly increased compared 
to the full length protein. In our case the fusion 
protein was completely secreted to the periplasm. 
Possibly, the yield of DsbA-proinsulin could be 
further improved if the C-peptide is modified as 
described by Kang and Yoon. 

The use of different medium additives during 
growth of E. coli expressing the dsbA-proinsulin 
gene resulted in an up to four-fold increase of the 
amount of native proinsulin. Much higher yields 
were achieved using LB medium as compared to 
mineral salt medium, indicating that the effect of 
supplements on the amount of native fusion 
protein depends also on many other components 
in the medium. LB medium is a rich medium 
which has the advantage that the cells show a 
higher growth rate than cells growing in mineral 
salt medium. On the other hand, mineral salt 
medium is a defined medium, which is commonly 
used in high cell density cultivations. For LB 
medium, only the addition of L-arginine increased
the amount of soluble fusion protein with native
proinsulin part. In case of mineral salt medium, 
only ethanol or yeast extract had a stimulating 
effect on the yield of native proinsulin. Since 
L-arginine is known to be an additive which in- 
creases the solubility of proteins and stimulates 
the refolding of proteins in vitro, we investigated 
the ability of L-arginine in the medium to increase 
the yield of soluble DsbA-proinsulin which is 
properly folded and contains the correct disulfide 
bond pattern. Cultivation in LB medium in the 
presence of 0.4 M L-arginine increased the cell 
specific yield of native proinsulin ~3.4-fold. Although
this is a clearly positive effect, the results
are low in comparison to in vitro studies on the 
refolding of other model proteins from inclusion 
bodies (De Bernardez Clark et al., 1999). Unex- 
pectedly, the addition of glutathione had a nega- 
tive effect on the yield of native proinsulin, 
although it had a positive effect in vivo and in 
vitro on the yield of BPTI, RBI, and alkaline 
phosphatase (Zapun and Creighton, 1994; Wun- 
derlich and Glockshuber, 1993; Akiyama et al., 
1992). This might indicate that formation of 
correct disulfide bonds is not the main problem upon 
formation of native proinsulin in the fusion 
protein. Surprisingly, the supplementation of the 
medium with yeast extract and ethanol had a 
positive effect on the amount of native proinsulin 
as well as on the cell growth in mineral salt 
medium. The amount of native proinsulin per liter 
of cultivation broth was increased 2.5-fold and 
5-fold by yeast extract and ethanol, respectively. 
Ben-Bassat et al. (1984) reported that the addition of yeast 
extract to cultures increases the production of 
soluble heterologous proteins in E. coli. Ethanol is 
a trigger of the heat shock response in E. coli and 
induces stress response proteins including 
molecular chaperones, such as the DnaK-DnaJ-GrpE 
and GroEL-GroES chaperone machineries. 
It has been suggested earlier that the positive 
effect of ethanol on the expression of recombinant 
genes in the cytoplasmic space is connected to its 
stimulating effect on the heat shock response (Ku- 
sano et al., 1999; Van Dyk et al., 1995). However, 
ethanol may also stimulate the formation of 
native proinsulin in the periplasm in a way 
analogous to that shown for the in vitro folding of IGF-I 
(Hejnaes et al., 1992). 
In conclusion, our data indicate that the yield 
of native human proinsulin produced in E. coli 
can be significantly increased by fusion of the 
proinsulin gene to dsbA and by supplementation 
of the culture with medium additives. We could 
show that L-arginine, ethanol, and yeast extract 
are substances which can increase the yield of 
soluble fusion protein with a native proinsulin 
part. A great advantage of the fusion system is 
that the proinsulin part can be cleaved off from 
the DsbA part by trypsin which is commonly used 
for conversion of proinsulin into insulin, the ther- 
apeutically active form of this protein. By this 
proteolytic conversion we could directly show that 
the proinsulin part of the fusion protein was 
correctly folded. 